data(speech)Practical session 3
NHMRC Clinical Trials Centre, University of Sydney
gillian.heller@sydney.edu.au
mixed discrete-continuous
continuous with probability mass at discrete point(s)
aka semi-continuous
On \mathbb{R}_+ with a probability spike at zero
daily rainfall
amount of insurance claim in a year
annual medical expenditure
f(y \mid \boldsymbol{\theta}, \pi)= \begin{cases}\pi & \text { if } y=0 \\ (1-\pi) f_c(y \mid \boldsymbol{\theta}) & \text { if } y>0\end{cases}
A patient-reported measure of pain is the Visual Analogue Scale (VAS)
Subjects are given a linear scale and asked to put a mark where they perceive themselves
Responses are measured and scaled to [0, 1]
While 1 is not frequently recorded, 0 is common
f\left(y \mid \boldsymbol{\theta}, \pi_0, \pi_1\right)= \begin{cases}\pi_0 & \text { if } y=0 \\ \left(1-\pi_0-\pi_1\right) f_c(y \mid \boldsymbol{\theta}) & \text { if } 0<y<1 \\ \pi_1 & \text { if } y=1\end{cases}
| Distribution | GAMLSS name | Parameters |
|---|---|---|
| zero-adjusted inverse Gaussian | ZAIG |
\mu, \sigma, \nu |
| zero-adjusted Gamma | ZAGA |
\mu, \sigma, \nu |
| zero-inflated beta | BEINF0, BEZI |
\mu, \sigma, \nu |
| one-inflated beta | BEINF1, BEOI |
\mu, \sigma, \nu |
| zero- and one-inflated beta | BEINF |
\mu, \sigma, \nu, \tau |
Speech intelligibility tests are conducted on hearing-impaired people, for the purpose of evaluating the performance of hearing devices under controlled listening conditions in a sound laboratory.
The subject listens to a short sentence (of length N words), mixed with noise, and repeats it back.
The number of correctly identified words: w
Response: proportion correct y=w/N
| Variable | Description |
|---|---|
N |
number of words in sentence |
w |
number of words correctly identified |
y |
proportion correct (w/N) |
snr |
signal-to-noise ratio |
algorithm |
sound processing algorithm (A, B, C) |
noise_dir |
noise direction (F = front, S = side) |
noise_gender |
noise gender (M, F) |
subject |
subject identifier |
Bounded continuous
Zero- and one-inflated
Recall: lecture 2, slides 31-32
If f_c(y) is e.g. the Beta distribution which has 2 parameters \mu and \sigma
f\left(y \mid \mu, \sigma, \pi_0, \pi_1\right)= \begin{cases}\pi_0 & \text { if } y=0 \\ \left(1-\pi_0-\pi_1\right) f_c(y \mid \mu, \sigma) & \text { if } 0<y<1 \\ \pi_1 & \text { if } y=1\end{cases}
In estimating the model, we have to respect the constraint
0<\hat\pi_{0i}+\hat\pi_{1i}<1\qquad\text{for }i=1,\ldots,n
f(y \mid \mu, \sigma, \xi_0, \xi_1)= \begin{cases}\frac{\xi_0}{1+\xi_0+\xi_1} & y=0 \\ \frac{1}{1+\xi_0+\xi_1} \cdot f_c(y \mid \mu, \sigma) & y \in(0,1) \\ \frac{\xi_1}{1+\xi_0+\xi_1} & y=1\end{cases}
where \mu \in(0,1), \sigma \in(0,1), \xi_0>0, \xi_1>0
We choose the distribution for 0<y<1, then add zero- and one-inflation
Use the gamlss function chooseDist() to pick the distribution for y\in(0,1):
speech.01 <- subset(speech, (y>0&y<1))
m1 <- gamlss(y~snr,
sigma.formula =~ snr,
nu.formula =~ snr,
tau.formula =~ snr,
family=BE, data=speech.01, n.cyc=100, trace=FALSE)
a <- chooseDist(m1, type="real0to1") minimum GAIC(k= 2 ) family: SIMPLEX
minimum GAIC(k= 3.84 ) family: SIMPLEX
minimum GAIC(k= 7.67 ) family: SIMPLEX
Now create the zero- and one-inflated Simplex distribution
A 0to1 inflated SIMPLEX distribution has been generated
and saved under the names:
dSIMPLEXInf0to1 pSIMPLEXInf0to1 qSIMPLEXInf0to1 rSIMPLEXInf0to1
plotSIMPLEXInf0to1
Simplex distribution has 2 parameters \mu and \sigma
Zero- and one-inflation adds
zero-inflation parameter \xi_0 = xi0
one-inflation parameter \xi_1 = xi1
To fit the model use the function gamlssInf0to1()
gamlssInf0to1(y = y,
mu.formula = ~1,sigma.formula = ~1,
nu.formula = ~1, tau.formula = ~1,
xi0.formula = ~1, xi1.formula = ~1,
data = data, family = SIMPLEX, ...)
m0 <- gamlssInf0to1(y = y,
mu.formula = ~ snr,
sigma.formula = ~snr,
xi0.formula = ~ snr, xi1.formula = ~ snr,
data=speech, family = SIMPLEX, n.cyc=50)
m.step <- stepGAICAll.A(m0,
scope=list(lower=~1,
upper=~ snr + noise_dir + noise_gender + algorithm +
algorithm:snr + algorithm:noise_dir + algorithm:noise_gender))m01.0 <- gamlss(y~snr,
sigma.formula =~ snr,
family=SIMPLEX, data=speech.01, n.cyc=100, trace=FALSE)
m01.step <- stepGAICAll.A(m01.0,
scope=list(lower=~1,
upper=~ snr + noise_dir + noise_gender + algorithm +
algorithm*snr + algorithm*noise_dir + algorithm*noise_gender))---------------------------------------------------
Distribution parameter: mu
Start: AIC= -924.71
y ~ snr
Df AIC
+ noise_gender 1 -931.45
+ algorithm 2 -927.83
<none> -924.71
+ noise_dir 1 -922.71
Step: AIC= -931.45
y ~ snr + noise_gender
Df AIC
+ algorithm 2 -934.12
<none> -931.45
+ noise_dir 1 -929.45
Step: AIC= -934.12
y ~ snr + noise_gender + algorithm
Df AIC
+ noise_gender:algorithm 2 -935.78
<none> -934.12
+ noise_dir 1 -932.13
+ snr:algorithm 2 -931.59
Step: AIC= -935.78
y ~ snr + noise_gender + algorithm + noise_gender:algorithm
Df AIC
<none> -935.78
+ noise_dir 1 -933.78
+ snr:algorithm 2 -932.63
---------------------------------------------------
Distribution parameter: sigma
Start: AIC= -935.78
~snr
Df AIC
- snr 1 -937.35
<none> -935.78
+ noise_dir 1 -934.51
+ noise_gender 1 -934.14
+ algorithm 2 -932.11
Step: AIC= -937.35
~1
Df AIC
<none> -937.35
+ noise_dir 1 -936.08
+ noise_gender 1 -935.89
+ snr 1 -935.78
+ algorithm 2 -933.72
---------------------------------------------------
Distribution parameter: mu
Start: AIC= -937.35
y ~ snr + noise_gender + algorithm + noise_gender:algorithm
Df AIC
<none> -937.35
- noise_gender:algorithm 2 -935.70
- snr 1 -867.35
---------------------------------------------------
******************************************************************
Family: c("SIMPLEX", "Simplex")
Call: gamlss(formula = y ~ snr + noise_gender + algorithm +
noise_gender:algorithm, sigma.formula = ~1, family = SIMPLEX,
data = speech.01, n.cyc = 100, trace = FALSE)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: logit
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.145803 0.052088 -2.799 0.00517 **
snr 0.058113 0.006755 8.603 < 2e-16 ***
noise_genderM -0.016676 0.063615 -0.262 0.79324
algorithmB -0.021673 0.064255 -0.337 0.73593
algorithmC -0.152696 0.060821 -2.511 0.01213 *
noise_genderM:algorithmB 0.149942 0.089073 1.683 0.09245 .
noise_genderM:algorithmC 0.199014 0.085822 2.319 0.02049 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.82451 0.01527 54 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
No. of observations in the fit: 2144
Degrees of Freedom for the fit: 8
Residual Deg. of Freedom: 2136
at cycle: 2
Global Deviance: -953.3476
AIC: -937.3476
SBC: -891.9842
******************************************************************
\begin{align*} y_i&\sim\text{SIMPLEX}(\mu_i, \sigma_i)\\ \text{logit}(\mu_i)&=\beta_0^\mu+\beta^\mu _{\text{snr}}x_{\text{snr}, i}+&\text{snr}\\ &\qquad \beta^\mu _{\text{ng=M}}x_{\text{ng=M}, i}+&\text{noise gender}\\ &\qquad \beta^\mu _{\text{alg=B}}x_{\text{alg=B}, i}+\beta^\mu _{\text{alg=C}}x_{\text{alg=C}, i}+&\text{algorithm}\\ &\qquad \beta^\mu _{\text{MB}}x_{\text{ng=M}, i}x_{\text{alg=B}, i}+\beta^\mu _{\text{MC}}x_{\text{ng=M}, i}x_{\text{alg=C}, i}&\text{noise gender}\times\text{algorithm}\\ \log(\sigma_i) &=\beta_0^\sigma \end{align*}
| Variable | Parameter | |
|---|---|---|
| \mu | \sigma | |
| snr | \checkmark | |
| noise gender | \checkmark | |
| algorithm | \checkmark | |
| noise gender \times algorithm | \checkmark |
y=0 | 0<y<1 | y=1
MN3, MN4, MN5 for 3-, 4- and 5-level nominal responsesMN3 model\mathbb{P}(Y=y \mid \mu, \sigma)= \begin{cases}p_0 & \text { if } y=0 \\ p_1 & \text { if } y=1 \\ p_2 & \text { if } y=2 \end{cases} where
\begin{align*} p_0&=\frac{\mu}{1+\mu+\sigma}, \qquad p_1=\frac{\sigma}{1+\mu+\sigma}, \qquad \mu>0,\quad \sigma>0 \end{align*}
So we model the probabilities \mathbb{P}(y=0) and \mathbb{P}(y=1)
y=2 is the reference level
We want to model \xi_0 and \xi_1, so create a multinomial response with 0<y<1 as the reference level
m.mult.0 <- gamlss(zero_one ~ snr, family=MN3, data=speech, trace=FALSE)
m.mult.step <- stepGAICAll.A(m.mult.0,
scope=list(lower=~1,
upper=~ snr + noise_dir + noise_gender + algorithm +
algorithm*snr + algorithm*noise_dir + algorithm*noise_gender))---------------------------------------------------
Distribution parameter: mu
Start: AIC= 13763.48
zero_one ~ snr
Df AIC
+ noise_gender 1 13701
+ algorithm 2 13760
<none> 13764
+ noise_dir 1 13765
Step: AIC= 13701
zero_one ~ snr + noise_gender
Df AIC
+ algorithm 2 13698
<none> 13701
+ noise_dir 1 13703
Step: AIC= 13697.8
zero_one ~ snr + noise_gender + algorithm
Df AIC
+ snr:algorithm 2 13681
<none> 13698
+ noise_gender:algorithm 2 13699
+ noise_dir 1 13700
Step: AIC= 13681.44
zero_one ~ snr + noise_gender + algorithm + snr:algorithm
Df AIC
<none> 13681
+ noise_dir 1 13683
+ noise_gender:algorithm 2 13685
---------------------------------------------------
Distribution parameter: sigma
Start: AIC= 13681.44
~1
Df AIC
+ snr 1 13387
+ algorithm 2 13662
<none> 13681
+ noise_gender 1 13683
+ noise_dir 1 13683
Step: AIC= 13386.64
~snr
Df AIC
+ algorithm 2 13362
+ noise_gender 1 13370
<none> 13387
+ noise_dir 1 13389
- snr 1 13681
Step: AIC= 13362.52
~snr + algorithm
Df AIC
+ noise_gender 1 13346
+ snr:algorithm 2 13361
<none> 13362
+ noise_dir 1 13364
- algorithm 2 13387
- snr 1 13662
Step: AIC= 13346.09
~snr + algorithm + noise_gender
Df AIC
+ noise_gender:algorithm 2 13343
+ snr:algorithm 2 13346
<none> 13346
+ noise_dir 1 13348
- noise_gender 1 13362
- algorithm 2 13370
- snr 1 13664
Step: AIC= 13343.08
~snr + algorithm + noise_gender + algorithm:noise_gender
Df AIC
<none> 13343
+ snr:algorithm 2 13343
+ noise_dir 1 13345
- algorithm:noise_gender 2 13346
- snr 1 13664
---------------------------------------------------
Distribution parameter: mu
Start: AIC= 13343.08
zero_one ~ snr + noise_gender + algorithm + snr:algorithm
Df AIC
<none> 13343
- snr:algorithm 2 13359
- noise_gender 1 13368
---------------------------------------------------
******************************************************************
Family: c("MN3", "Multinomial 3 levels")
Call: gamlss(formula = zero_one ~ snr + noise_gender + algorithm +
snr:algorithm, sigma.formula = ~snr + algorithm +
noise_gender + algorithm:noise_gender, family = MN3,
data = speech, trace = FALSE)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.12510 0.08915 12.620 < 2e-16 ***
snr -0.28711 0.02270 -12.650 < 2e-16 ***
noise_genderM -0.33376 0.06474 -5.156 2.60e-07 ***
algorithmB -0.31473 0.10549 -2.984 0.002858 **
algorithmC -0.36384 0.10786 -3.373 0.000747 ***
snr:algorithmB 0.11234 0.02791 4.025 5.76e-05 ***
snr:algorithmC 0.02403 0.02949 0.815 0.415192
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.71275 0.09202 -7.745 1.10e-14 ***
snr 0.20758 0.01208 17.181 < 2e-16 ***
algorithmB -0.16630 0.10449 -1.592 0.11153
algorithmC -0.56855 0.10438 -5.447 5.31e-08 ***
noise_genderM 0.11956 0.10052 1.189 0.23433
algorithmB:noise_genderM 0.10949 0.13685 0.800 0.42370
algorithmC:noise_genderM 0.35711 0.13803 2.587 0.00969 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
No. of observations in the fit: 6720
Degrees of Freedom for the fit: 14
Residual Deg. of Freedom: 6706
at cycle: 6
Global Deviance: 13315.08
AIC: 13343.08
SBC: 13438.46
******************************************************************
| Variable | Parameter | |||
|---|---|---|---|---|
| \mu | \sigma | \xi_0 | \xi_1 | |
| snr | \checkmark | \checkmark | \checkmark | |
| noise gender | \checkmark | \checkmark | \checkmark | |
| algorithm | \checkmark | \checkmark | \checkmark | |
| noise gender \times algorithm | \checkmark | \checkmark | ||
| snr \times algorithm | \checkmark |
try(pe_pdf(m.overall, term="algorithm",
scenario = list("snr"=5, "noise_gender"="F"),
x.values = c("A", "B", "C")))Error in pe_pdf(m.overall, term = "algorithm", scenario = list(snr = 5, :
Supply a standard GAMLSS model in obj
Again we run into trouble because we are using a user-defined distribution.
distreg.vis also won’t work for a user-defined distribution.
See https://gamlssbook.bitbucket.io/Chapter9.html for a different approach.
Practical 3